Artificial intelligence dialects of the Bayesian belief revision language

Authors

  • Shimon Schocken
  • Paul R. Kleindorfer
Abstract

Rule-based expert systems must deal with uncertain data, subjective expert opinions, and inaccurate decision rules. Computer scientists and psychologists have proposed and implemented a number of belief languages, and new insights into their Bayesian interpretations are presented here. In particular, the authors focus on three alternative belief-update models: the certainty factors calculus, Dempster-Shafer simple support functions, and the descriptive contrast/inertia model. Important "dialects" of these languages are shown to be isomorphic to each other and to a special case of Bayesian inference. Parts of this analysis were carried out by other authors; these results were extended and consolidated through an analytic technique designed to study the kinship of belief languages.

I. BELIEF LANGUAGES IN ARTIFICIAL INTELLIGENCE

During the past decade, computer-based expert systems have emerged to become the most applied facet of artificial intelligence (AI). To date, expert systems have proven to be particularly effective in diagnostic tasks, e.g., antimicrobial selection and interpretation of geological data (Duda and Shortliffe [15]). Such "domains of expertise" are typically characterized by uncertain field data, subjective expert opinions, and inexact decision rules. The challenge of dealing with these uncertainties has stirred a lively debate among developers of expert systems on the one hand, and faithful followers of the Bayesian religion on the other. First, the drive to experiment with real expert systems has led AI researchers to implement ad hoc uncertainty mechanisms that are rather limited on normative grounds. This, in turn, has drawn criticism from Bayesian writers who, nonetheless, were forced to admit that the classical methods they preached did not always scale up to realistic applications. As the reference section of this paper indicates, the result was an inspiring exchange of ideas that is currently going strong in many academic circles in computer science, statistics, and psychology.

(Manuscript received August 29, 1988; revised January 29, 1989. S. Schocken is with the Leonard N. Stern School of Business, New York University, 624 Tisch Hall, Washington Square, New York, NY 10003. P. R. Kleindorfer is with the Wharton School, Suite 1150, Steinberg Hall, Dietrich Hall, University of Pennsylvania, Philadelphia, PA 19104. IEEE Log Number 8929617.)

Just like any other formal model, an expert system imposes a rigid structure on the problem it attempts to support. In particular, rule-based systems make the assumption that expertise can be captured through a modular set of inference rules. These rules are supposed to represent "objective" knowledge as well as subjective expert opinions. To illustrate, consider the following familiar problem. A tenured professor (hereafter referred to as a "recruiter") attempts to guess the academic potential of a candidate for a junior faculty position. The information available to the recruiter is the typical mix of resume, papers, and recommendation letters, along with his own past recruiting experience. The review process is complicated by the fact that many young Ph.D.'s are competing for the same slot; therefore, the goal of the recruiter is to rank-order the candidates in terms of their prospective academic potential. The overall criterion for academic success is taken to be the perceived likelihood of the proposition, "the candidate will be offered a tenured position in our department within the next decade." A recruiter is said to be an "expert" if his predictions consistently exhibit a great deal of external validity.
To make an extreme case, let us assume that our recruiter is a perfect predictor: all the candidates that he has recommended in the past were subsequently offered tenure, and all the people that he has rejected were refused tenure in similar departments. Furthermore, our expert is willing to describe his proven recruiting rationale in terms of a set of inference rules. These rules represent, in his mind, the perceived academic significance of various credentials. Given this and other relevant information, is there a plausible model that can credibly synthesize this rule-base into a belief about the academic potential of a particular candidate? Can this model be further implemented in a computer-based system designed to carry out routine screening of candidates? These questions must be addressed by developers and users of rule-based expert systems.

For the sake of simplicity, we assume that the credentials of prospective candidates can be enumerated through a finite set of dichotomous propositions {e_1, ..., e_n}. For example, e_1 might say that the candidate has an undergraduate degree in mathematics, and e_2 that he has extensive consulting experience. In a rule-based system, these clues are related to various hypotheses through a set of rules, elicited from a recruiting expert. For example, our recruiter might suggest that, more often than not, a math degree e improves tenure prospects h. This inexact implication might be represented as: if e then h, with degree of belief bel(h, {e}).

The meaning of the quantitative degree of belief bel(h, {e}) in this logical context is an open question, tracing back to Carnap's work on inductive logic [7]. It is tempting to give these degrees of belief a probabilistic interpretation; nonetheless, scores of writers have demonstrated that logic and probability do not mix very well. This difficulty, combined with the pressure to develop rule-based systems that can handle uncertainty, has led to a proliferation of incompatible quasi-probabilistic belief languages. In this paper we review and analyze these languages from a Bayesian perspective. Our general approach is as follows: given a particular belief language, we wish to address three questions, related to the syntax, calculus, and semantics, respectively, of the underlying language.

1) What is the Bayesian interpretation of the number bel(h, {e})?
2) Given a set of degrees of belief bel(h, {e_1}), ..., bel(h, {e_n}) and a belief combination rule C(bel(h, {e_1}), ..., bel(h, {e_n})), what is the Bayesian interpretation of this rule?
3) Do the answers to 1) and 2) place any constraints on the scope of inference problems that rule-based systems can credibly solve?
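To fix ideas, an inexact implication of this kind might be encoded as follows (our sketch, not the authors'; the names and numbers are invented, and the meaning of the `degree` field is exactly the open question discussed above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UncertainRule:
    """An inexact implication: if `evidence` then `hypothesis`,
    held with quantitative degree of belief bel(h, {e})."""
    evidence: str
    hypothesis: str
    degree: float  # a certainty factor? a likelihood ratio? a DS mass?

# Fragment of the recruiter's rule base from the running example.
rules = [
    UncertainRule("math_degree", "tenure", 0.7),
    UncertainRule("extensive_consulting", "tenure", 0.2),
]
```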
It goes without saying that proper knowledge engineering requires that degrees of belief be elicited, computed, and interpreted in a credible manner. Beside the normative significance of this objective, degrees of belief should be handled cautiously, since the plausibility of the final set of hypotheses generated by an expert system is typically a function of the algebraic ranking of their posterior degrees of belief. For example, the diagnosis program INTERNIST (Pople [43]) uses the disease with the highest-ranking degree of belief as an anchor, around which a fuzzy subset of plausible hypotheses is defined.

Thus, both the external as well as the internal validity of an expert system are directly related to the validity of its underlying belief language. Nonetheless, it seems by now that no single belief language exists that will be effective, efficient, and plausible for all possible applications and, at the same time, satisfy all knowledge engineers, experts, users, and researchers. This realization has led to a proliferation of new belief languages in the last decade. At the foundation of any of these languages lies either a descriptive or a normative argument about human judgment under uncertainty. A descriptive language attempts to capture the way experts actually reason; hence its measure of performance is its capability of simulating human judgment. In contrast, a normative model is based on the premise that human reasoning under uncertainty is often suboptimal; rather than attempting to replicate it blindly, the normative approach is more concerned with the proximity of the system's judgment to such rigorous standards as logic or probability theory. It is well-known by now that normative models are not necessarily consistent with descriptive models, as has been demonstrated by numerous studies in cognitive psychology and in decision sciences. For a good discussion of this dichotomy, the reader is referred to Baron [4].

The classical method for representing and updating degrees of belief is the Bayesian language, which is consistent with the axioms of subjective probability. Researchers who attempted to implement this method in expert systems, however, were quick to discover three major limitations: a) standard manipulations of discrete joint distribution functions are computationally complex (Pearl [40]); b) the Bayesian language does not lend itself easily to representing ambiguity and fuzzy expert opinions (Shafer [50])¹; and c) human reasoning under uncertainty is systematically inconsistent with Bayesian inference (Tversky and Kahneman [57]).

Efforts to curb the inherent limitations of a complete Bayesian design have taken several directions. Early expert systems such as MYCIN (Shortliffe [53]) and Prospector (Duda et al. [14]) employed ad hoc belief languages based on certainty factors and subjective likelihood ratios, respectively. The resulting belief-update mechanisms were only partially consistent with the axioms of subjective probability. These "first generation" systems gave way to a renewed interest in probability theory, led by Howard's [28] work on influence diagrams and Pearl's [40] belief networks. Pearl's methods of propagating probabilities through a network of propositions are consistent with standard probability theory. In the case of singly connected networks, the run-time of Pearl's algorithm is polynomial in the size of the network. The problem of probabilistic belief-update in a general network has been shown to be NP-hard (Cooper [11]).

Perhaps the most important development in the quest to "extend" the Bayesian language is Shafer's work on belief functions and the corresponding Dempster rule for combining them (Shafer [50]). The theory of belief functions has a rigorous mathematical foundation based on a relaxation of the additivity axiom of probability theory. The resulting Dempster-Shafer language provides explicit tools for dealing with ambiguity and "uncommitted belief."
There has recently been a surge of interest in the Dempster-Shafer language within the AI community, and a considerable number of expert systems and expert system shells already employ this technique (e.g., Baldwin and Monk [3]). Finally, there have been several attempts in decision sciences and cognitive psychology to specify descriptive or "behavioral" belief languages. For example, Einhorn and Hogarth [19] proposed a pragmatic anchoring-and-adjustment belief-update model called the contrast/inertia language. Such experimental works are typically based on trying to fit a mathematical model that best explains the behavior of subjects in a controlled experiment. To the extent that this empirical effort can be extended to model the reasoning of successful experts rather than naive subjects, this approach clearly merits attention from practitioners of expert systems.

¹There have been attempts to deal with second-order probability within the Bayesian language, e.g., Pearl [41], Baron [5], and Kyburg [31].

In the 1970's, the development of new belief languages was carried out primarily by people outside the mainstream of the statistics community. This line of research went by and large unnoticed until the 1980's. Since then, a number of faithful Bayesians have begun to defend their turf by arguing that every new belief language must be matched with a plausible probabilistic interpretation. As Kyburg [31] puts it, "it is appropriate to examine the formal relations between various Bayesian and non-Bayesian approaches... in order to explore the question of whether the new techniques are really more powerful than the old, and the question of whether, if they are, this increment of power is bought at too high a price."

The present work is motivated by this line of thought. We will review previous work and present new results on the Bayesian interpretations of important versions of the certainty factors, contrast/inertia, and Dempster-Shafer languages. The paper draws on previous work by Heckerman [26], Hajek [25], and Grosof [23]. Most of what we currently know about the normative validity of the certainty factors language is due to Heckerman's interpretation of certainty factors as transformed likelihood ratios. Hajek has also investigated the normative validity of alternative belief revision calculi, but his point of departure was not probabilistic at all. He proposed an axiomatic calculus consisting of four plausible combination functions and went on to analyze the algebraic structure of the set of degrees of belief induced by these functions. This analysis led to his conclusion that the certainty factors language and the ad hoc Bayesian language used in Prospector induce isomorphic sets of degrees of belief. This isomorphism, however, was based on nonprobabilistic algebraic mappings.

In this paper we take a different approach, as follows. The paper commences with a review of rule-based inference, the context in which belief languages are used in artificial intelligence. A Bayesian language is then presented, and its underlying (extraprobabilistic) rationality is demonstrated. This is done to motivate our choice of the Bayesian language, as opposed to another desiderata (e.g., Hajek's), as a standard against which other belief languages will be compared. We then present an analytic methodology which, lacking a better name, is termed dialectal analysis. This set of tools, designed to investigate the kinship of two or more belief languages, is then used to describe the implicit relationships among the certainty factors, contrast/inertia, and Dempster-Shafer languages. Each of these languages is discussed in a separate section which has three parts. First, a brief overview of the language is given. Second, a Bayesian interpretation is presented and justified. Third, the mathematical implications of the interpretation are discussed. A discussion section integrates the preceding results and suggests future research directions.

II. RULE-BASED INFERENCE UNDER UNCERTAINTY

The two major building blocks of an expert system are the knowledge base and the inference engine. The knowledge base is conceptually a directed graph consisting of propositional nodes and inferential arcs. The boundary of the graph consists of a set of competing hypotheses (e.g., diseases in a medical diagnosis application) and a set of observable clues (e.g., diagnostic symptoms). Inner nodes represent subhypotheses (e.g., clinical syndromes). The directed arc connecting nodes e and h represents a direct inferential relationship between e and h, and the arc's label is the strength of this relationship, or the degree of belief, bel(h, e).² The inference engine is a search algorithm that prunes this evidential network and applies modus ponens repetitively. One difference between this and standard theorem proving stems from the uncertainty associated with rules: as the inference engine prunes rules that ultimately imply a hypothesis, a belief calculus is applied to update the posterior belief in this hypothesis. This noncategorical reasoning process terminates when the belief in one or more hypotheses exceeds a certain predefined cutoff value. At least in theory, this value should be based on the cost of gathering additional evidence and on the consequences of committing type I and type II errors.

²From now on, the notation bel(h, e) is shorthand for bel(h, {e}).

The preceding paragraph emphasizes the central role that belief calculi play in noncategorical rule-based inference. According to Shafer and Tversky [51], the building blocks of a belief language are syntax, calculus, and semantics. In the context of this paper, we define syntax to be the set of all degrees of belief that are relevant to a particular hypothesis h. Typically, a set of atomic degrees of belief bel(h, {e_1}), ..., bel(h, {e_n}) is elicited directly from a human expert, while compound degrees of belief are computed ad hoc through a set of operators collectively known as a belief calculus. A completely specified rule-based belief calculus must consist of five combination functions: parallel combination (combining the degrees of belief rendered by two or more independent rules), sequential combination (combining the rule's degree of belief with the uncertainty associated with the rule's antecedent), and three logical combination functions (for negations, disjunctions, and conjunctions of uncertain pieces of evidence). In this paper we focus on parallel combination only.

With regard to semantics, we take the position that the semantics of a particular belief language is given in terms of a mapping from the syntax and calculus dimensions of the language onto the theory of subjective probability. To the extent that the latter theory is taken to be a norm for rationality, this mapping provides normative validity to the syntax and calculus dimensions of the belief language in question.
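To make this architecture concrete, the following minimal sketch (ours, not the authors' system) represents the knowledge base as a labeled directed graph and the inference engine as a fold that applies parallel combination to the rules bearing directly on one hypothesis. The combination operator is passed in as a parameter, since its definition is precisely what distinguishes one belief language from another; all names and numbers are illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple

# knowledge base: hypothesis -> list of (clue, degree-of-belief) arcs
KnowledgeBase = Dict[str, List[Tuple[str, float]]]

def posterior_belief(kb: KnowledgeBase,
                     hypothesis: str,
                     observed: Set[str],
                     prior: float,
                     combine: Callable[[float, float], float]) -> float:
    """Fold the degrees of belief of all fired rules into a posterior,
    using the parallel-combination operator `combine`."""
    belief = prior
    for clue, degree in kb.get(hypothesis, []):
        if clue in observed:                  # the rule's antecedent holds
            belief = combine(belief, degree)  # parallel combination
    return belief

# Illustrative run: odds-style combination (anticipating Section II-A),
# where degrees of belief are likelihood ratios and C is multiplication.
kb: KnowledgeBase = {"tenure": [("math_degree", 4.0),
                                ("extensive_consulting", 1.5)]}
odds = posterior_belief(kb, "tenure", {"math_degree"}, prior=0.25,
                        combine=lambda x, y: x * y)
print(odds)  # 1.0 -- belief after observing only the math degree
```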
It is reasonable to assume that any rule-based belief calculus will be founded on some parameterized variant of the following model:

$$\mathrm{bel}(h, E) = C\big(\mathrm{bel}(h),\, \mathrm{bel}(h, e_1),\, \ldots,\, \mathrm{bel}(h, e_n)\big) \qquad (1)$$

The single-place degree of belief bel(h), which is a shorthand for bel(h, ∅), represents prior belief, i.e., unconditional belief in a hypothesis before any evidence is observed; for example, belief in the tenure prospects of a candidate drawn at random from the population of new Ph.D.'s. The degree of belief bel(h, e_i) measures the degree of support that the clue e_i renders to the hypothesis h. Depending on the underlying belief language, this evidential relationship might take two forms. Under a diagnostic mode of inference, bel(h, e) parameterizes the rule (IF e THEN h) and represents the expert's belief in the likelihood of h in light of the supporting evidence e. Under a causal interpretation, bel(h, e) parameterizes the rule (IF h THEN e), representing the degree of belief in the effect e occurring given its retrospective cause h. The direction of inference in rules varies across belief languages, and we therefore leave it unspecified at the abstract level of (1). Finally, given a set of evidence E = {e_1, ..., e_n}, the posterior belief in h in light of E is computed by the belief synthesis operator C. Depending on the properties of C, it is sometimes possible to compute bel recursively, making it an anchoring-and-adjustment belief-update model:

$$\mathrm{bel}(h, E \cup \{e\}) = C\big(\mathrm{bel}(h, E),\, \mathrm{bel}(h, e)\big) \qquad (2)$$

Focusing on some intuitive properties of belief-update, we might require that bel be commutative, i.e.,

$$C(x, y) = C(y, x)$$

and associative, i.e.,

$$C\big(C(x, y), z\big) = C\big(x, C(y, z)\big).$$

These properties are quite plausible on rational grounds; we wouldn't like a physician to change his diagnosis simply because the order by which information is presented to him is altered. In general, one is free to construct an axiomatic desiderata regarding the intuitive properties of bel, using it further to evaluate formal belief languages. Such an approach was undertaken by Cox [12], Savage [47], Popper [44], and, most recently, Horvitz et al. [27]. For example, Cox enumerated seven intuitive properties of belief-update and proceeded to prove that the resulting bel function is a probability. By augmenting Cox's framework with three additional intuitive properties, Horvitz et al. have shown that any belief-update measure that satisfies the extended framework must be equal to some monotonic transformation of a likelihood ratio.

Note that the belief-update model (1) is neither "good" in any philosophical sense, nor does it reflect any plausible desiderata. At the same time, the general form of (1) is largely dictated by the rule-based architecture, which assumes that a) wholistic expert knowledge can be decomposed into a finite series of discrete observations (rules), and b) subsets of this knowledge base can be synthesized or "rolled back" into posterior beliefs. Suppose that we accept a) and b) as reasonable restrictions on the subset of inference problems to be studied. Can we define this subset more precisely? Is it true that different belief languages are capable of modeling different subsets of inference problems? These questions will be addressed in what follows.
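A quick check (ours) of why commutativity and associativity matter for the recursion (2): with an operator C that has both properties, such as the multiplication used by the odds calculus derived below, every presentation order of the same evidence synthesizes to the same posterior.

```python
from functools import reduce
from itertools import permutations

def synthesize(prior: float, impacts: list) -> float:
    """Anchoring-and-adjustment: fold per-clue impacts into the prior,
    one piece of evidence at a time, as in (2)."""
    return reduce(lambda belief, impact: belief * impact, impacts, prior)

impacts = [4.0, 0.5, 2.5]  # illustrative per-clue belief adjustments
posteriors = {round(synthesize(1.0, list(order)), 12)
              for order in permutations(impacts)}
assert len(posteriors) == 1  # order and clustering effects are absent
```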
The term inference problem introduced in the last paragraph refers to an ordered set of propositions, say (h, e_1, ..., e_n), in which h and {e_1, ..., e_n} are interpreted as a hypothesis and a relevant body of evidence, respectively. Given this terminology, the "solution" of an inference problem refers to the correct Bayesian computation of the posterior belief in h in light of {e_1, ..., e_n}. From a probabilistic standpoint, (h, e_1, ..., e_n) is viewed as a space of dichotomous random variables, characterized by a joint distribution function P: W → [0, 1], where W is the set of 2^(n+1) boolean permutations defined over the space. Under this Bayesian interpretation, the ultimate goal of the inference process is to compute the posterior probability P(h|e_1, ..., e_n). It is easy to see, however, that any brute-force attempt to compute this conditional probability from the joint distribution function P(h, e_1, ..., e_n) is bound to be exponential in n.

The preceding paragraph assumes that P is known. In reality, this is clearly not the case. For example, the inference problem (h, e_1, ..., e_n) might represent the inferential relationships that exist between, say, a hidden oil deposit h and its geological manifestations {e_1, ..., e_n}. Clearly, complete knowledge of the joint distribution function P(·) will rarely be credibly available, and P(·) will have to be generated piecemeal using partial data and subjective expert judgment. At the same time, the elusive P might serve as a mechanism for defining classes of inference problems which vary in terms of their computational complexity. This is motivated by the notion that the cognitive complexity of an inference problem has something to do with the mathematical modularity of its underlying P. With that in mind, the following characterization of inference problems is of special interest, as will be seen shortly:

Ratio-Form Conditionally Independent Problems: An inference problem (h, e_1, ..., e_n) is said to be ratio-form conditionally independent (Grosof [23]) if

$$\frac{P(e_1, \ldots, e_n \mid h)}{P(e_1, \ldots, e_n \mid \bar h)} = \prod_{i=1}^{n} \frac{P(e_i \mid h)}{P(e_i \mid \bar h)}. \qquad (3)$$

Many writers have haphazardly read (3) to imply that e_1, ..., e_n are conditionally independent given both h and h̄.³ Clearly, (3) reflects a weaker notion of modularity which is necessary but not sufficient for the latter assertion.

³h̄ hereafter denotes "not h."

We now turn to describe a well-known version of the Bayesian language that is limited to ratio-form conditionally independent problems. This language is used as a standard against which other belief languages will be compared. This strategy (as opposed to formulating a desiderata) is based on the premise that the Bayesian belief-update model has extraprobabilistic tenets of rationality that cannot be debated by a reasonable person. We hope that the following section will help to convince skeptical readers that this is indeed true. Before delving into this discussion, we take the liberty of confining the analysis to inference problems (h, {e_1, e_2}) consisting of one hypothesis and two pieces of evidence. All the findings reported here can be easily extended to any finite number of pieces of evidence, as long as they are all directly connected to the single hypothesis in question. The inclusion of intermediate hypotheses raises computational problems which normally require heuristics and/or very restrictive assumptions on the underlying joint distribution function. Bayesian inference in complex networks is a challenging and active research area, and the interested reader is referred to Cheeseman [33], Cooper [11], Pearl [40], and Shenoy and Shafer [52].

Degrees of belief in the ratio-form Bayesian language are expressed in terms of odds and likelihood ratios: the posterior belief in a hypothesis h in light of E = {e_1, e_2} is the odds bel(h, E) = P(h|E)/P(h̄|E), P being a standard subjective probability. The degree of belief in the causal rule (if h then e) is represented through the conditional likelihood ratio P(e|h)/P(e|h̄).
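The following toy check (ours; all probabilities invented) contrasts the two computations just described: a brute-force evaluation straight from the joint distribution, and the product of per-clue likelihood ratios licensed by (3). The joint is built to be conditionally independent given both h and h̄, which is sufficient (though, as noted, not necessary) for (3).

```python
p_h = 0.2                          # prior P(h)
p_e1 = {True: 0.8, False: 0.3}     # P(e1 | h) and P(e1 | not h)
p_e2 = {True: 0.6, False: 0.4}     # P(e2 | h) and P(e2 | not h)

def joint(h: bool, e1: bool, e2: bool) -> float:
    """P(h, e1, e2) under conditional independence given h and not-h."""
    ph = p_h if h else 1.0 - p_h
    pe1 = p_e1[h] if e1 else 1.0 - p_e1[h]
    pe2 = p_e2[h] if e2 else 1.0 - p_e2[h]
    return ph * pe1 * pe2

# Brute force, straight from the joint distribution:
brute_odds = joint(True, True, True) / joint(False, True, True)

# Ratio form, assembling per-clue likelihood ratios and prior odds
# as licensed by (3):
ratio_odds = ((p_e1[True] / p_e1[False])
              * (p_e2[True] / p_e2[False])
              * (p_h / (1.0 - p_h)))

assert abs(brute_odds - ratio_odds) < 1e-12
```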
With that in mind, the Bayesian calculus is viewed as a mechanism designed to synthesize a set of causal degrees of belief into a combined posterior belief. The remainder of this section presents the derivation of this calculus. Similar derivations were carried out in numerous papers in decision theory and in AI, e.g., Peirce [42], Edwards et al. [17], and Charniak [8].

A. Derivation of the Bayesian Calculus

We begin with Bayes rule, applied to both P(h|e_1, e_2) and P(h̄|e_1, e_2):

$$P(h \mid e_1, e_2) = \frac{P(e_1, e_2 \mid h) \, P(h)}{P(e_1, e_2)} \qquad (4)$$

$$P(\bar h \mid e_1, e_2) = \frac{P(e_1, e_2 \mid \bar h) \, P(\bar h)}{P(e_1, e_2)} \qquad (5)$$

Dividing (4) by (5) gives the odds-ratio version of Bayes rule:

$$\frac{P(h \mid e_1, e_2)}{P(\bar h \mid e_1, e_2)} = \frac{P(e_1, e_2 \mid h)}{P(e_1, e_2 \mid \bar h)} \cdot \frac{P(h)}{P(\bar h)} \qquad (6)$$

Now, if the problem (h, e_1, e_2) is ratio-form conditionally independent (Grosof [23]), (6) reduces to the following definition of the Bayesian calculus:

$$\frac{P(h \mid e_1, e_2)}{P(\bar h \mid e_1, e_2)} = \frac{P(e_1 \mid h)}{P(e_1 \mid \bar h)} \cdot \frac{P(e_2 \mid h)}{P(e_2 \mid \bar h)} \cdot \frac{P(h)}{P(\bar h)} \qquad (C1)$$

or, using a simplified notation,

$$R(h \mid e_1, e_2) = L(e_1 \mid h) \cdot L(e_2 \mid h) \cdot R(h). \qquad (7)$$

The elimination of the joint distribution of evidence P(e_1, e_2) in the step from (4) and (5) to (6) has important practical implications. First, the elicitation of any joint distribution function is a painstaking undertaking which should be avoided whenever possible. Second, (6) is completely independent of the degree of uncertainty associated with the evidence E = {e_1, e_2}: if a clue e is uncertain, i.e., P(e) < 1, this uncertainty is as relevant to h as it is to h̄, and therefore it can be canceled out. Finally, the elimination of P(e_1, e_2) gives the knowledge engineer a wide choice of elicitation techniques, as any one of the following measures might be used to parameterize the evidential support that e renders to h: the pair of probabilities (P(e|h), P(e|h̄)), the likelihood ratio P(e|h)/P(e|h̄), or the betting odds O(h|e)/O(h) = (P(h|e)/P(h̄|e))/(P(h)/P(h̄)). The latter two expressions are equivalent, and either of them can be derived from the former. This variety is important, as different applications and different experts might prefer one elicitation method over the other.
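The calculus (7) is mechanical enough to transcribe directly; the following sketch (ours) folds elicited likelihood ratios into prior odds and converts the result back to a probability. The 20-percent base rate echoes the recruiting example below, and the likelihood ratios are invented for illustration.

```python
def posterior_odds(prior_odds: float, likelihood_ratios) -> float:
    """The ratio-form Bayesian calculus (7):
    R(h | e1, ..., en) = L(e1|h) * ... * L(en|h) * R(h)."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def probability(odds: float) -> float:
    """Convert odds R = P / (1 - P) back to a probability."""
    return odds / (1.0 + odds)

# Worked example: a 20-percent base rate gives prior odds 0.25; two clues
# with likelihood ratios 4.0 and 1.5 yield posterior odds 1.5, i.e.,
# P(h | e1, e2) = 0.6.
r = posterior_odds(0.2 / 0.8, [4.0, 1.5])
print(r, probability(r))  # 1.5 0.6
```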
We now turn to discuss some nonmathematical properties of rationality that are consistent with the Bayesian calculus (C1).

B. On the Underlying Rationality of the Bayesian Language

The subjective⁴ school of probability (e.g., Ramsey [46]) is based on the argument that the semantic interpretation of probability can be given in terms of rational human judgment under uncertainty. Proponents of this school of thought argued vigorously and quite convincingly that any other interpretation of probability is merely a special case of the subjective philosophy (de Finetti [13]). As Savage [47, p. 67] puts it, "the personalistic view incorporates all the universally acceptable criteria for reasonableness in judgment known to me... when any criteria that may have been overlooked are brought forward, they will be welcomed into the personalistic view."

⁴Also referred to as the Bayesian or personal school of probability (e.g., Ramsey [46]).

Following Savage's line of thought, we wish to show that, given some plausible interpretations, important "dialects" of new belief languages are indeed special cases of the classical Bayesian language. The latter language is considered a gold standard because, normatively speaking, it is consistent with many intuitive principles of rational judgment. Some of these principles, which are all consistent with (C1), are discussed below. In what follows, a human (or mechanized) agent who behaves in accordance with (C1) is called a Bayesian judge. A human agent who entertains an abstract belief-update model which may or may not be related to (C1) is called a human judge.

Proper Synthesis of Degrees of Belief: The tendency of human judges to underweight or ignore base-rate information is a well-known manifestation of the representativeness bias (Tversky and Kahneman [57]). For example, consider a candidate for a junior faculty position who already has several major publications under his belt. We argue that the fact that the candidate is a prolific writer (e) might cause a human judge to overestimate his future tenure prospects (h). This overoptimism occurs either by misinterpreting P(e|h)/P(e|h̄) to be P(h|e)/P(h̄|e), or by letting a high and salient diagnostic impact P(e|h)/P(e|h̄) overshadow the "dull" background information that only, say, 20 percent of actual recruits are ultimately promoted to tenure. This bias will not distract a Bayesian judge who adheres to (C1), where the base-rate information P(h)/P(h̄) is explicitly represented and carries the same weight as P(e|h)/P(e|h̄). Furthermore, (C1) is both commutative and associative, meaning that the evidential impact of clues is independent of order and clustering effects. This is in sharp contrast to human judgment, which is prone to such belief synthesis biases as the primacy effect (Anderson [2]), the recency effect (Lopes [35]), misinterpretation of new evidence (Nisbett and Ross [37]), conservatism (Edwards [16]), and a host of other "averaging" rules that violate (C1) (Slovic and Lichtenstein [54]).

Fusion of Quantitative and Qualitative Evidence: The formation of a posterior belief in an uncertain hypothesis typically requires a joint consideration of factual information as well as subjective opinions. For example, consider the two propositions prolific researcher (e_1) and good teacher (e_2). A Bayesian inference system that evaluates tenure prospects (h) will have to use, among other things, the probabilities P(e_1|h) and P(e_2|h). Where do these numbers come from? The probability that a tenured professor is a prolific researcher can be obtained in a frequentist fashion, counting the number of prolific researchers among a known sample of tenured professors. Teaching ability may be a more elusive property, and the probability P(e_2|h) can be elicited as a "personal" degree of belief, using a domain expert. According to the subjective school of probability, the scope of rational values that the frequentist P(e_1|h) and the subjective P(e_2|h) may attain is constrained by the same set of axioms. Due to this uniformity, the Bayesian language provides a homogeneous framework in which stochastic and epistemic degrees of belief are combined through the same calculus.

Equal Attention to Positive and Negative Evidence: Human judges who are left to their own devices are known consistently to seek and overweigh supportive evidence at the expense of neglecting or underestimating negative clues (Koriat et al. [30]). At the same time, a knowledge engineer who is guided by (C1) is forced to discount supportive belief of the form P(e|h) with its negative complementary belief P(e|h̄). For example, the impact of a good record of publications (e) on tenure prospects (h) is automatically discounted by the observation that good research is also produced by nontenured candidates (P(e|h̄) > 0). In other words, (C1) forces the Bayesian judge to "dilute" his belief P(e|h) with the complementary belief P(e|h̄); said otherwise, if (C1) is transformed into a log-linear scoring formula, both P(e|h) and P(e|h̄) will have the same (unit) "coefficient of importance" in the resulting model. Cognitive distortions of the "dilution effect" were discussed by Nisbett et al. [38].
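The log-linear scoring formula just mentioned is obtained by taking logarithms of (C1) (our rendering, generalized to n clues); writing it out makes both the explicit base-rate term and the equal unit coefficients of the supportive and complementary likelihoods visible:

```latex
\[
\underbrace{\log R(h \mid e_1, \ldots, e_n)}_{\text{posterior log-odds}}
  \;=\;
\underbrace{\log \frac{P(h)}{P(\bar h)}}_{\text{base rate}}
  \;+\;
\sum_{i=1}^{n}
  \Big( \underbrace{1 \cdot \log P(e_i \mid h)}_{\text{supportive}}
      \;-\; \underbrace{1 \cdot \log P(e_i \mid \bar h)}_{\text{complementary}} \Big)
\]
```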
The More Information, the Better: A close examination of (C1) reveals that the Bayesian calculus is consistent with the commonly held principle that data (pieces of evidence) should never be thrown away. More precisely, Savage [47, p. 48] has shown that for n sufficiently large, the probability, given that h is true, that the likelihood-ratio product L(e_1|h) · ... · L(e_n|h) (as in (7)) is greater than any preassigned number is almost one, i.e., for 0 < c < ∞,

$$P\big( L(e_1 \mid h) \cdots L(e_n \mid h) > c \;\big|\; h \big) \longrightarrow 1 \quad \text{as } n \to \infty,$$

barring two banal exceptions. Our interpretation of this beautiful theorem is that the Bayesian judge becomes wiser as he goes along in his judicious search for relevant information: as more and more balanced and nonredundant evidence is brought to bear, the posterior belief in any hypothesis approaches certainty (through (C1)), provided that this hypothesis is indeed true. It is trivial to show that this holds for any nonzero prior belief in the hypothesis; in that respect, the theorem also confirms that a rational person is a pragmatic learner. Regardless of how little belief s/he initially holds in an unpopular truth, e.g., that jogging is bad for one's health, s/he is willing to change that opinion freely as new information becomes available.

Explicit Treatment of Conditional Independence: The Bayesian language offers a rich variety of tools for detecting, representing, and dealing with correlated evidence. For example, it is easy to show that any one of the following three assertions implies the other two:

1) P(e_1|h, e_2) = P(e_1|h);
2) P(e_2|h, e_1) = P(e_2|h);
3) P(e_1, e_2|h) = P(e_1|h) · P(e_2|h).

The equivalence of 1) and 2) implies that the assertion "given h, knowing e_1 does not change my belief in e_2" is symmetric with respect to e_1 and e_2, as we would have expected. The equivalence of 1) and 3) implies that either 1) or 2) is consistent with the classical definition of statistical independence. Note, however, that either 1) or 2) is preferred to 3) from a cognitive standpoint: each conveys a better understanding of the notion of independence, and together they offer two cognitively different but mathematically equivalent techniques to elicit the same phenomenon. Hence Bayesian knowledge engineers who seek to detect correlated evidence might use 1) and 2) either separately or in tandem. This provides a richer language for knowledge acquisition and a simple means for cross-verification of human inputs. In the event that some pieces of evidence are correlated (with respect to either h or h̄), (C1) becomes invalid on normative grounds. Several authors, e.g., Charniak [8] and Pearl [40], proposed heuristic techniques to transform correlated inference networks into equivalent (but not identical) networks in which some version of (C1) might be applied. As this line of research is quite new, these techniques are somewhat limited. At the same time, statistical dependency is a prevailing feature of nature that cannot be swept under the rug. It is therefore fortunate that the Bayesian language provides a powerful arsenal of tools for expressing this phenomenon and dealing with it explicitly.
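The equivalence of assertions 1)-3) above is easy to confirm numerically. The following check (ours, with invented probabilities) starts from the product form 3) and recovers both screening-off statements 1) and 2):

```python
p_e1_h, p_e2_h = 0.7, 0.4      # P(e1 | h) and P(e2 | h); invented values

def p_joint_given_h(e1: bool, e2: bool) -> float:
    """P(e1, e2 | h) under assertion 3), the product form."""
    a = p_e1_h if e1 else 1.0 - p_e1_h
    b = p_e2_h if e2 else 1.0 - p_e2_h
    return a * b

# Assertion 1): P(e1 | h, e2) = P(e1 | h)
p_e1_given_h_e2 = p_joint_given_h(True, True) / (
    p_joint_given_h(True, True) + p_joint_given_h(False, True))
assert abs(p_e1_given_h_e2 - p_e1_h) < 1e-12

# Assertion 2): P(e2 | h, e1) = P(e2 | h)
p_e2_given_h_e1 = p_joint_given_h(True, True) / (
    p_joint_given_h(True, True) + p_joint_given_h(True, False))
assert abs(p_e2_given_h_e1 - p_e2_h) < 1e-12
```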
Compatibility with Decision Theory: Aside from its compliance with rational judgment, the Bayesian language is syntactically compatible with many models that may be used in the context of expert systems. For example, Raiffa's [45] value-of-information analysis attempts to pursue the "most valuable" clue, i.e., the clue with the (anticipated) maximal diagnostic impact on current belief. In a similar vein, Langlotz et al. [32] proposed a decision-theoretic extension to rule-based reasoning, in which utility considerations serve to evaluate the merit of potential search paths in a rule-based inference net. There exist numerous other areas in which AI and decision theory might benefit greatly from each other. To reap such benefits, though, both disciplines have to speak the same language. In this regard it is worth noting that practically all the work in prescriptive decision theory is already cast in terms of the Bayesian language. Hence this language would be a pragmatic starting point for a comparative analysis.

In summary, perhaps the reader is by now convinced that (C1) has more to it than at first appears. From a technical standpoint, Bayes rule is a trivial exercise in set theory, given the axioms of subjective probability. These axioms, however, can be nontrivially derived from rational behavior under uncertainty (Cox [12], de Finetti [13], Savage [47]); thus the epistemological interpretation of Bayes rule, and its implications for judgment, learning, and experience, go well beyond its mechanical derivation. This dichotomy is clearly a manifestation of the Janus face of probability: "on the one side it is statistical, concerning itself with stochastic laws of chance processes. On the other side it is epistemological, dedicated to assessing degrees of belief in propositions quite devoid of statistical background" (Hacking [24, p. 12]).

We conclude this section with a formal definition of the (ratio-form conditionally independent) Bayesian language, which plays a central role in what follows.

Definition 1: Let L_Bayes be the belief language whose syntax consists of likelihood ratios of the form P(e|h)/P(e|h̄) and whose calculus is (C1).

Thus we focus on an inference context in which causal rules of the form h → e are parameterized by likelihood ratios of the form P(e|h)/P(e|h̄). Furthermore, we assume that these degrees of belief are combined through Bayes rule (C1). Since (C1) is restricted to ratio-form conditionally independent inference problems, L_Bayes is a special case of a more general Bayesian language. To avoid clutter, though, we will refer below to L_Bayes as the Bayesian language. We will comment later on the restrictiveness of...


Similar Articles

PhD Dissertation: Propositional Reasoning that Tracks Probabilistic Reasoning

Bayesians model one’s doxastic state by subjective probabilities. But in traditional epistemology, in logic-based artificial intelligence, and in everyday life, one’s doxastic state is usually expressed in a qualitative, binary way: either one accepts (believes) a proposition or one does not. What is the relationship between qualitative and probabilistic belief? I show that, besides the familia...

Full text

Information Retrieval, Imaging and Probabilistic Logic

Imaging is a class of non-Bayesian methods for the revision of probability density functions originally proposed as a semantics for conditional logic. Two of these revision functions, standard imaging and general imaging, have successfully been applied to modelling information retrieval by Crestani and van Rijsbergen. Due to the problematic nature of a "direct" implementation of imaging revision...

Full text

A Unified Model of Qualitative Belief Change: A Dynamical Systems Perspective

Belief revision and belief update have been proposed as two types of belief change serving different purposes, revision intended to capture changes in belief state reflecting new information about a static world, and update intended to capture changes of belief in response to a changing world. We argue that routine belief change involves elements of both and present a model of generalized updat...

Full text

Lost in Translation: Language Independence in Propositional Logic - Application to Belief Revision and Belief Merging

Despite the importance of propositional logic in artificial intelligence, the notion of language independence in the propositional setting (not to be confounded with syntax independence) has not received much attention so far. In this paper, we define language independence for a propositional operator as robustness w.r.t. symbol translation. We provide a number of characterization results for su...

Full text

Tuning Belief Revision for Coordination with Inconsistent Teammates

Coordination with an unknown human teammate is a notable challenge for cooperative agents. Behavior of human players in games with cooperating AI agents is often sub-optimal and inconsistent, leading to choreographed and limited cooperative scenarios in games. This paper considers the difficulty of cooperating with a teammate whose goal and corresponding behavior change periodically. Previous wo...

Full text

TWO-DIMENSIONAL BELIEF CHANGE: An Advertisement

In this paper I compare two different models of two-dimensional belief change, namely 'revision by comparison' (Fermé and Rott, Artificial Intelligence 157, 2004) and 'bounded revision' (Rott, in Hommage à Wlodek, Uppsala 2007). These revision operations are two-dimensional in the sense that they take as arguments pairs consisting of an input sentence and a reference sentence. Two-dimension...

Full text


Journal title:
  • IEEE Trans. Systems, Man, and Cybernetics

Volume 19, Issue 5

Pages -

Publication date: 1989